24 research outputs found

    Digital cultural heritage and revitalization of endangered Finno-Ugric languages

    The preservation of linguistic diversity has long been recognized as a crucial, integral part of supporting our cultural heritage. Yet many “minority” languages (those that lack official state status) are in decline, and many are severely endangered. We present a prototype system aimed at “heritage” speakers of endangered Finno-Ugric languages. Heritage speakers are people who heard the language used by older generations while growing up, and who possess considerable passive competency, well beyond the “beginner” level, but lack active fluency. Our system is based on natural language processing and artificial intelligence. It assists learners by allowing them to learn from arbitrary texts of their choice, and by creating exercises that engage them in active production of language rather than in passive memorization of material. Continuous automatic assessment helps guide the learner toward improved fluency. We believe that providing such AI-based tools will help bring these languages to the forefront of the modern digital age, raise their prestige, and encourage the younger generations to become involved in reversing language decline.

    Assessing Grammatical Correctness in Language Learning

    We present experiments on assessing the grammatical correctness of learners’ answers in a language-learning System (references to the System, and the links to the released data and code, are withheld for anonymity). In particular, we explore the problem of detecting alternative-correct answers: cases where more than one inflected form of a lemma fits syntactically and semantically in a given context. We approach the problem with methods for grammatical error detection (GED), since we hypothesize that models for detecting grammatical mistakes can assess the correctness of potential alternative answers in a learning setting. Due to the paucity of training data, we explore the ability of pre-trained BERT to detect grammatical errors and then fine-tune it using synthetic training data. In this work, we focus on errors in inflection. Our experiments show (a) that pre-trained BERT performs worse at detecting grammatical irregularities for Russian than for English, and (b) that fine-tuned BERT yields promising results on assessing the correctness of grammatical exercises; (c) they also establish a new benchmark for Russian. To further investigate its performance, we compare fine-tuned BERT with one of the state-of-the-art models for GED (Bell et al., 2019) on our dataset and on RULEC-GEC (Rozovskaya and Roth, 2019). We release the manually annotated learner dataset, used for testing, for general use.

    Revita: a Language-learning Platform at the Intersection of ITS and CALL

    This paper presents Revita, a Web-based platform for language learning beyond the beginner level. We anchor the presentation in a survey, in which we review the literature on recent advances in the fields of computer-aided language learning (CALL) and intelligent tutoring systems (ITS). We outline the established desiderata of CALL and ITS and discuss how Revita addresses the majority of the theoretical requirements of both fields. Finally, we claim that, to the best of our knowledge, Revita is currently the only platform for learning/tutoring beyond the beginner level that is functional, freely available, and supports multiple languages.

    Multiple Admissibility in Language Learning: Judging Grammaticality using Unlabeled Data

    We present our work on the problem of detecting Multiple Admissibility (MA) in language learning. Multiple Admissibility occurs when more than one grammatical form of a word fits syntactically and semantically in a given context. In second-language education, and in particular in intelligent tutoring systems and computer-aided language learning (ITS/CALL), systems generate exercises automatically. MA implies that multiple alternative answers are possible. We treat the problem as a grammaticality judgement task. We train a neural network with the objective of labeling sentences as grammatical or ungrammatical, using a "simulated learner corpus": a dataset with correct text and with artificial errors, generated automatically. While MA occurs commonly in many languages, this paper focuses on learning Russian. We present a detailed classification of the types of constructions in Russian in which MA is possible, and evaluate the model using a test set built from answers provided by users of the Revita language-learning system.

    Revita: a System for Language Learning and Supporting Endangered Languages

    We describe a computational system for language learning and supporting endangered languages. The platform offers the user an opportunity to improve her competency through active language use. The platform currently works with several endangered Finno-Ugric languages, as well as with Yakut, Finnish, Swedish, and Russian. This paper describes the current stage of ongoing development.

    Semi-automatically Annotated Learner Corpus for Russian

    We present ReLCo (the Revita Learner Corpus), a new semi-automatically annotated learner corpus for Russian. The corpus was collected while several hundred L2 learners were performing exercises using the Revita language-learning system. All errors were detected automatically by the system and annotated by type. Part of the corpus was annotated manually; this part was created for further experiments on automatic assessment of grammatical correctness. The Learner Corpus provides valuable data for studying patterns of grammatical errors, experimenting with grammatical error detection and grammatical error correction, and developing new exercises for language learners. Automating the collection and annotation makes building the learner corpus much cheaper and faster than the traditional approach to building learner corpora. We make the data publicly available.

    Tools for supporting language learning for Sakha

    This paper presents an overview of linguistic resources available for the Sakha language, and introduces new tools for supporting language learning for Sakha. The essential resources include a morphological analyzer, digital dictionaries, and corpora of Sakha texts. We extended an earlier version of the morphological analyzer/transducer, built on the Apertium finite-state platform. The analyzer currently has an adequate level of coverage, between 86% and 89% on two Sakha corpora. Based on these resources, we implement a language-learning environment for Sakha in the Revita computer-assisted language learning (CALL) platform. Revita is a freely available online language-learning platform for learners beyond the beginner level. We describe the tools for Sakha currently integrated into the Revita platform. To our knowledge, this is at present the first large-scale project undertaken to support intermediate-to-advanced learners of a minority Siberian language.
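
To make the coverage figure above concrete, here is a minimal sketch of how such a number could be computed. It assumes coverage means the share of corpus tokens that receive at least one analysis, which the abstract does not state explicitly. The toy lexicon and the `toy_analyze` function are hypothetical stand-ins for a real morphological transducer, not the Apertium API.

```python
def coverage(tokens, analyze):
    """Share of tokens for which `analyze` returns at least one analysis."""
    if not tokens:
        return 0.0
    recognized = sum(1 for token in tokens if analyze(token))
    return recognized / len(tokens)


# Hypothetical toy lexicon standing in for a real analyzer (assumption).
TOY_LEXICON = {
    "alpha": ["alpha<n>"],
    "beta": ["beta<n>"],
    "gamma": ["gamma<v>"],
}


def toy_analyze(token):
    """Return the list of analyses for a token, or an empty list if unknown."""
    return TOY_LEXICON.get(token.lower(), [])


tokens = ["alpha", "beta", "delta", "Gamma"]
print(f"{coverage(tokens, toy_analyze):.0%}")  # 3 of 4 tokens are recognized
```

Counting distinct word types instead of running tokens would usually give a lower figure, so the two definitions should not be mixed when comparing analyzers.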

    Grouping business news stories based on salience of named entities

    In news aggregation systems focused on broad news domains, certain stories may appear in multiple articles. Depending on the relative importance of the story, the number of versions can reach dozens or hundreds within a day. The text in these versions may be nearly identical or quite different. Linking multiple versions of a story into a single group brings several important benefits to the end user: it reduces the cognitive load on the reader and signals the relative importance of the story. We present a grouping algorithm, and explore several vector-based representations of input documents, from a baseline using keywords to a method using salience, a measure of the importance of named entities in the text. We demonstrate that features beyond keywords yield substantial improvements, verified on a manually annotated corpus of business news stories.
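
The abstract does not specify the grouping algorithm, so the following is only a generic sketch of the idea: each document is represented as a sparse vector of named-entity salience scores, and documents are grouped greedily by cosine similarity. The entity names, the scores, the `threshold` value, and the choice to compare against the first member of each group are all illustrative assumptions, not the authors' method.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_stories(docs, threshold=0.6):
    """Greedy grouping: join the first group whose seed document is similar
    enough, otherwise start a new group. Returns lists of document indices."""
    groups = []
    for index, vector in enumerate(docs):
        for group in groups:
            seed = docs[group[0]]
            if cosine(vector, seed) >= threshold:
                group.append(index)
                break
        else:
            groups.append([index])
    return groups


# Hypothetical entity-salience vectors for three articles (assumption).
docs = [
    {"AcmeCorp": 0.9, "BetaBank": 0.4},
    {"AcmeCorp": 0.8, "BetaBank": 0.5},
    {"GammaOil": 1.0},
]
print(group_stories(docs))  # [[0, 1], [2]]
```

Swapping the salience dictionaries for keyword-count dictionaries gives the keyword baseline under the same grouping procedure, which is one way to compare representations on equal terms.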

    Applying gamification incentives in the Revita language-learning system

    We explore the importance of gamification features in a language-learning platform designed for intermediate-to-advanced learners. Our main thesis is that learning toward advanced levels requires a massive investment of time. If the learner engages in more practice sessions, and if the practice sessions are longer, we can expect the results to be better. This principle appears to be tautologically self-evident. Yet keeping the learner engaged in general, and building gamification features in particular, requires substantial effort on the part of developers. Our goal is to keep the learner engaged in long practice sessions over many months, rather than only in the short term. In academic research on language learning, resources are typically scarce, and gamification is usually not considered an essential priority for allocating resources. We argue in favor of giving serious consideration to gamification in the language-learning setting as a means of enabling in-depth research. In this paper, we introduce several gamification incentives in the Revita language-learning platform. We discuss the problems in obtaining quantitative measures of the effectiveness of gamification features.

    v-trel: Vocabulary Trainer for Tracing Word Relations: An Implicit Crowdsourcing Approach

    In this paper, we present our work on developing a vocabulary trainer that uses exercises generated from language resources such as ConceptNet and crowdsources the learners' responses to enrich the language resource. We performed an empirical evaluation of our approach with 60 non-native speakers over two days, which shows that new entries to expand ConceptNet can be gathered efficiently through vocabulary exercises on word relations. We also report on the feedback gathered from the users and from an expert in language teaching, and discuss the potential of the vocabulary trainer application from the user and language-learner perspectives. The feedback suggests that v-trel has educational potential, although some shortcomings were identified in its current state.